Mapping the unknown: The spatially correlated multi-armed bandit

Authors

  • Charley M. Wu
  • Eric Schulz
  • Maarten Speekenbrink
  • Jonathan D. Nelson
  • Björn Meder
Abstract

We introduce the spatially correlated multi-armed bandit as a task coupling function learning with the exploration-exploitation trade-off. Participants interacted with bivariate reward functions on a two-dimensional grid, with the goal of either gaining the largest average score or finding the largest payoff. By providing an opportunity to learn the underlying reward function through spatial correlations, we model to what extent people form beliefs about unexplored payoffs and how those beliefs guide search behavior. Participants adapted to assigned payoff conditions, performed better in smooth than in rough environments, and, surprisingly, sometimes performed equally well in short as in long search horizons. Our modeling results indicate a preference for local search options. Even when this preference is accounted for, participants were best described as forming local inferences about unexplored regions, combined with a search strategy that directly traded off exploiting high expected rewards against exploring to reduce uncertainty about the spatial structure of rewards.
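The abstract does not spell out a concrete model, so the following is only a minimal sketch of one way such a search strategy could look: a Gaussian process with a squared-exponential kernel generalizes observed rewards to nearby tiles, and an upper-confidence-bound rule trades off expected reward against uncertainty. The kernel choice, the grid size, and the names `length_scale`, `noise`, and `beta` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 2-D points (assumed kernel)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def gp_posterior(x_obs, y_obs, x_all, length_scale=1.0, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP at every point in x_all."""
    k_oo = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_ao = rbf_kernel(x_all, x_obs, length_scale)
    mean = k_ao @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is 1 for this kernel; subtract the part explained by the data.
    var = 1.0 - np.einsum("ij,ji->i", k_ao, np.linalg.solve(k_oo, k_ao.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def ucb_choice(mean, std, beta=1.0):
    """Index of the tile maximizing expected reward plus an uncertainty bonus."""
    return int(np.argmax(mean + beta * std))

# Hypothetical 5x5 grid with two rewards observed so far.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
x_obs = grid[[0, 12]]          # tiles (0, 0) and (2, 2)
y_obs = np.array([0.2, 0.9])   # made-up rescaled payoffs
mean, std = gp_posterior(x_obs, y_obs, grid)
choice = ucb_choice(mean, std, beta=0.5)
```

In this sketch a larger `beta` weights uncertainty reduction more heavily, while `beta = 0` reduces the rule to pure exploitation of the posterior mean. A smaller `length_scale` would make inferences more local, since observations then inform fewer neighboring tiles.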

Similar resources

A quality assuring multi-armed bandit crowdsourcing mechanism with incentive compatible learning

We develop a novel multi-armed bandit (MAB) mechanism for the problem of selecting a subset of crowd workers to achieve an assured accuracy for each binary labelling task in a cost optimal way. This problem is challenging because workers have unknown qualities and strategic costs.

Gap-free Bounds for Stochastic Multi-Armed Bandit

We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror descent type algorithm. We consider separately two assumptions: nonnegative losses or arbitrary losses with an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bo...

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit-dependent cost. Successive samples from each population are i.i.d. random variables with unknown distribution. The objective is to have a feasible policy for deciding ...

Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms

We consider a dynamic pricing problem under unknown demand models. In this problem a seller offers prices to a stream of customers and observes either success or failure in each sale attempt. The underlying demand model is unknown to the seller and can take one of N possible forms. In this paper, we show that this problem can be formulated as a multi-armed bandit with dependent arms. We propose...

Asymptotic Bayes Analysis for the Finite Horizon One Armed Bandit Problem

The multi-armed bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this paper we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with di...

Publication date: 2017